CODE OUTPUT

- Team name: ㅁㅁㅁㅉ (team leader: 서승민)

Requirements

  • current path ends with '/engine'
  • pandas version >= 1.1.0
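
The pandas requirement can be checked before running the cells below. This is a minimal sketch that assumes ordinary dotted version strings such as `1.1.0`; the helper name `version_at_least` is hypothetical and is not part of the `engine` package.

```python
import re


def version_at_least(installed: str, required: str = "1.1.0") -> bool:
    """Compare dotted version strings numerically on their first three parts."""

    def parse(version: str) -> tuple:
        parts = []
        for piece in version.split(".")[:3]:
            # Keep only the leading digits, so "0rc1" is read as 0.
            match = re.match(r"\d+", piece)
            parts.append(int(match.group()) if match else 0)
        # Pad so that "1.1" compares equal to "1.1.0".
        while len(parts) < 3:
            parts.append(0)
        return tuple(parts)

    return parse(installed) >= parse(required)


if __name__ == "__main__":
    try:
        import pandas as pd
        print(pd.__version__, version_at_least(pd.__version__))
    except ImportError:
        print("pandas is not installed")
```

Comparing integer tuples rather than raw strings matters here: as strings, `"1.10.0"` would sort before `"1.2.0"`.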
In [ ]:
# cd engine     #  if needed

Import functions

In [9]:
from engine.features import *
from engine.train import *
from engine.predict import *

Train

Results are shown in the order in which the plots and MAPE values are printed:

  1. step 1 model (full) output - weekday case
  2. step 1 model (full) output - weekend case
  3. step 2 model (top) output - weekday case
  4. step 2 model (top) output - weekend case
  5. step 3 model (mixed) output - weekday case
  6. step 3 model (mixed) output - weekend case
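
The source does not show how the step 3 "mixed" model combines the step 1 (full) and step 2 (top) models. The sketch below illustrates one plausible rule, purely as an assumption: rows whose full-model prediction ranks in the highest fraction take the top-model prediction, and all other rows keep the full-model prediction. The function `mix_predictions` and its `top_fraction` parameter are hypothetical names, not part of the `engine` package.

```python
def mix_predictions(full_pred, top_pred, top_fraction):
    """Blend two prediction lists with a hypothetical rule (not confirmed by the source).

    Rows whose full-model prediction ranks in the highest `top_fraction`
    take the top-model prediction; the rest keep the full-model prediction.
    """
    if len(full_pred) != len(top_pred):
        raise ValueError("prediction lists must have the same length")
    if not 0.0 <= top_fraction <= 1.0:
        raise ValueError("top_fraction must be within [0, 1]")
    n_top = int(round(len(full_pred) * top_fraction))
    # Row indices sorted by full-model prediction, largest first.
    order = sorted(range(len(full_pred)), key=lambda i: full_pred[i], reverse=True)
    top_rows = set(order[:n_top])
    return [top_pred[i] if i in top_rows else full_pred[i]
            for i in range(len(full_pred))]


full = [10.0, 50.0, 30.0, 90.0, 20.0]
top = [11.0, 55.0, 33.0, 99.0, 22.0]
mixed = mix_predictions(full, top, top_fraction=0.4)
print(mixed)
```

With `top_fraction=0.4`, only the two rows with the largest full-model predictions are replaced; the other three pass through unchanged.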

In [10]:
run_models()
[LightGBM] [Warning] min_data_in_leaf is set=135, min_child_samples=20 will be ignored. Current value: min_data_in_leaf=135
[LightGBM] [Warning] feature_fraction is set=1, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=1
MAPE of best iter is 62.440214790368145
MAE of best iter is 12049201.6181706
[LightGBM] [Warning] min_data_in_leaf is set=134, min_child_samples=20 will be ignored. Current value: min_data_in_leaf=134
[LightGBM] [Warning] feature_fraction is set=1, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=1
MAPE of best iter is 59.22467182120729
MAE of best iter is 14267106.691029698
[LightGBM] [Warning] min_data_in_leaf is set=70, min_child_samples=20 will be ignored. Current value: min_data_in_leaf=70
[LightGBM] [Warning] feature_fraction is set=1, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=1
MAPE of best iter is 44.71356679800993
MAE of best iter is 8586890.56807056
[LightGBM] [Warning] min_data_in_leaf is set=30, min_child_samples=20 will be ignored. Current value: min_data_in_leaf=30
[LightGBM] [Warning] feature_fraction is set=1, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=1
MAPE of best iter is 59.72814198710952
MAE of best iter is 11238578.291582339
MAPE of mixed model is 58.22857240119097
MAE of mixed model is 9130783.780307008
RMSE of mixed model is 14406855.902448481
MAPE of mixed model is 54.20542972553576
MAE of mixed model is 10267380.167272182
RMSE of mixed model is 16567601.57823917
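
The log lines above report MAPE, MAE and RMSE. For reference, the standard definitions of these metrics can be sketched as follows. How `engine` computes them is not shown in the source; this sketch assumes MAPE is expressed in percent (values such as 62.44 suggest so) and, as a further assumption, skips rows whose actual value is zero.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error in percent; rows with actual == 0 are skipped."""
    terms = [abs((a - p) / a) for a, p in zip(actual, predicted) if a != 0]
    if not terms:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(terms) / len(terms)


def mae(actual, predicted):
    """Mean absolute error, in the same unit as the target."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the same unit as the target."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 500.0]
print(mape(actual, predicted), mae(actual, predicted), rmse(actual, predicted))
```

MAPE weights every row by its relative error, while MAE and RMSE are dominated by rows with large absolute values, so the three metrics can rank models differently.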

Cross-Validation

The results of cross-validation based on August (month 8) are shown below.
Plots and MAPE values are printed in the same order as listed above.

In [3]:
cross_validation([8])  
WD - CV with month 8 is starting.
[LightGBM] [Warning] min_data_in_leaf is set=135, min_child_samples=20 will be ignored. Current value: min_data_in_leaf=135
[LightGBM] [Warning] feature_fraction is set=1, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=1
MAPE of best iter is 62.440214790368145
MAE of best iter is 12049201.6181706
WK - CV with month 8 is starting.
[LightGBM] [Warning] min_data_in_leaf is set=134, min_child_samples=20 will be ignored. Current value: min_data_in_leaf=134
[LightGBM] [Warning] feature_fraction is set=1, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=1
MAPE of best iter is 59.22467182120729
MAE of best iter is 14267106.691029698
WD - CV for Mixed model - month 8 is starting.
WD - CV for Mixed model - top 12% is starting.
[LightGBM] [Warning] min_data_in_leaf is set=135, min_child_samples=20 will be ignored. Current value: min_data_in_leaf=135
[LightGBM] [Warning] feature_fraction is set=1, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=1
MAPE of best iter is 62.440214790368145
MAE of best iter is 12049201.6181706
[LightGBM] [Warning] min_data_in_leaf is set=70, min_child_samples=20 will be ignored. Current value: min_data_in_leaf=70
[LightGBM] [Warning] feature_fraction is set=1, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=1
MAPE of best iter is 33.113727097478304
MAE of best iter is 8421860.79683742
MAPE of mixed model is 58.428071704042225
MAE of mixed model is 9706826.168581635
RMSE of mixed model is 15311782.142353829
WD - CV for Mixed model - top 40% is starting.
[LightGBM] [Warning] min_data_in_leaf is set=135, min_child_samples=20 will be ignored. Current value: min_data_in_leaf=135
[LightGBM] [Warning] feature_fraction is set=1, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=1
MAPE of best iter is 62.440214790368145
MAE of best iter is 12049201.6181706
[LightGBM] [Warning] min_data_in_leaf is set=70, min_child_samples=20 will be ignored. Current value: min_data_in_leaf=70
[LightGBM] [Warning] feature_fraction is set=1, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=1
MAPE of best iter is 51.519009808006025
MAE of best iter is 8763454.710261866
MAPE of mixed model is 58.172488954785386
MAE of mixed model is 9026807.001682216
RMSE of mixed model is 14295391.465194885
WK - CV for Mixed model - month 8 is starting.
WK - CV for Mixed model - top 16% is starting.
[LightGBM] [Warning] min_data_in_leaf is set=134, min_child_samples=20 will be ignored. Current value: min_data_in_leaf=134
[LightGBM] [Warning] feature_fraction is set=1, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=1
MAPE of best iter is 59.22467182120729
MAE of best iter is 14267106.691029698
[LightGBM] [Warning] min_data_in_leaf is set=30, min_child_samples=20 will be ignored. Current value: min_data_in_leaf=30
[LightGBM] [Warning] feature_fraction is set=1, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=1
MAPE of best iter is 47.6258234726127
MAE of best iter is 9552475.97934541
MAPE of mixed model is 55.829860488071624
MAE of mixed model is 11026251.764762705
RMSE of mixed model is 17691119.340476085
WK - CV for Mixed model - top 40% is starting.
[LightGBM] [Warning] min_data_in_leaf is set=134, min_child_samples=20 will be ignored. Current value: min_data_in_leaf=134
[LightGBM] [Warning] feature_fraction is set=1, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=1
MAPE of best iter is 59.22467182120729
MAE of best iter is 14267106.691029698
[LightGBM] [Warning] min_data_in_leaf is set=30, min_child_samples=20 will be ignored. Current value: min_data_in_leaf=30
[LightGBM] [Warning] feature_fraction is set=1, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=1
MAPE of best iter is 77.8308144987196
MAE of best iter is 11400455.568024727
MAPE of mixed model is 55.71893534040129
MAE of mixed model is 9982359.730889853
RMSE of mixed model is 15831673.647606954
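
The call `cross_validation([8])` appears to validate on a single calendar month. A minimal sketch of such a month-based holdout split follows, assuming that the listed months form the validation set and all other months form the training set. The function `split_by_month` and the `(timestamp, value)` row layout are hypothetical and are not taken from the `engine` package.

```python
from datetime import datetime


def split_by_month(rows, holdout_months):
    """Split (timestamp, value) rows into training and validation lists by month."""
    holdout = set(holdout_months)
    train = [row for row in rows if row[0].month not in holdout]
    valid = [row for row in rows if row[0].month in holdout]
    return train, valid


rows = [
    (datetime(2019, 7, 31, 23, 0), 1.0),
    (datetime(2019, 8, 1, 6, 0), 2.0),
    (datetime(2019, 8, 15, 6, 0), 3.0),
    (datetime(2019, 9, 1, 6, 0), 4.0),
]
train, valid = split_by_month(rows, [8])
print(len(train), len(valid))
```

Holding out a whole month keeps every validation row from the same season, which mirrors predicting a single future month.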

Robust Cross-Validation

In [4]:
robust_cross_validation()
[LightGBM] [Warning] min_data_in_leaf is set=135, min_child_samples=20 will be ignored. Current value: min_data_in_leaf=135
[LightGBM] [Warning] feature_fraction is set=1, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=1
MAPE of best iter is 62.440214790368145
MAE of best iter is 12049201.6181706
[LightGBM] [Warning] min_data_in_leaf is set=70, min_child_samples=20 will be ignored. Current value: min_data_in_leaf=70
[LightGBM] [Warning] feature_fraction is set=1, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=1
MAPE of best iter is 44.71356679800993
MAE of best iter is 8586890.56807056
MAPE of mixed model is 65.68742170441918
MAE of mixed model is 9913607.77066452
RMSE of mixed model is 15916426.363957847
[LightGBM] [Warning] min_data_in_leaf is set=134, min_child_samples=20 will be ignored. Current value: min_data_in_leaf=134
[LightGBM] [Warning] feature_fraction is set=1, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=1
MAPE of best iter is 59.22467182120729
MAE of best iter is 14267106.691029698
[LightGBM] [Warning] min_data_in_leaf is set=30, min_child_samples=20 will be ignored. Current value: min_data_in_leaf=30
[LightGBM] [Warning] feature_fraction is set=1, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=1
MAPE of best iter is 59.72814198710952
MAE of best iter is 11238578.291582339
MAPE of mixed model is 59.84353875890805
MAE of mixed model is 10854021.5843167
RMSE of mixed model is 18942390.734410714

Predict

In [5]:
wd_predicted, wk_predicted = predict()
In [17]:
submitted = submission(wd_predicted, wk_predicted)
print(submitted.shape)
submitted.head(7)
(2891, 8)
Out[17]:
   방송일시             노출(분)  마더코드  상품코드  상품명                                상품군  판매단가  취급액
0  2020-06-01 06:20:00  20.0      100650    201971    잭필드 남성 반팔셔츠 4종              의류    59800     4.388903e+06
1  2020-06-01 06:40:00  20.0      100650    201971    잭필드 남성 반팔셔츠 4종              의류    59800     9.519179e+06
2  2020-06-01 07:00:00  20.0      100650    201971    잭필드 남성 반팔셔츠 4종              의류    59800     7.853356e+06
3  2020-06-01 07:20:00  20.0      100445    202278    쿠미투니카 쿨 레이시 란쥬쉐이퍼&팬티  속옷    69900     1.910230e+07
4  2020-06-01 07:40:00  20.0      100445    202278    쿠미투니카 쿨 레이시 란쥬쉐이퍼&팬티  속옷    69900     2.821718e+07
5  2020-06-01 08:00:00  20.0      100445    202278    쿠미투니카 쿨 레이시 란쥬쉐이퍼&팬티  속옷    69900     4.161966e+07
6  2020-06-01 08:20:00  20.0      100381    201247    바비리스 퍼펙트 볼륨스타일러          이미용  59000     2.195466e+07

The predicted sales amounts were inserted into the June 2020 broadcast schedule raw data as the '취급액' (sales amount) column.
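
The step of writing the weekday and weekend predictions back into the schedule can be sketched as below. This is not the actual `submission` implementation, which is not shown in the source. It assumes that `wd` and `wk` stand for weekday and weekend, that weekend means Saturday and Sunday, and that each prediction sequence is ordered like the matching schedule rows. The function `attach_predictions` is a hypothetical name.

```python
import pandas as pd


def attach_predictions(schedule, wd_pred, wk_pred):
    """Return a copy of `schedule` with predictions in a '취급액' column.

    Assumes '방송일시' is a datetime column and that each prediction sequence
    follows the row order of the weekday / weekend subset it belongs to.
    """
    out = schedule.copy()
    out["취급액"] = float("nan")
    # dayofweek: Monday=0 ... Sunday=6, so 5 and 6 are the weekend.
    is_weekend = out["방송일시"].dt.dayofweek >= 5
    out.loc[~is_weekend, "취급액"] = list(wd_pred)
    out.loc[is_weekend, "취급액"] = list(wk_pred)
    return out


schedule = pd.DataFrame({
    "방송일시": pd.to_datetime([
        "2020-06-01 06:20:00",  # Monday
        "2020-06-06 06:20:00",  # Saturday
        "2020-06-02 06:20:00",  # Tuesday
    ]),
    "판매단가": [59800, 69900, 59000],
})
result = attach_predictions(schedule, wd_pred=[1.0, 3.0], wk_pred=[2.0])
print(result)
```

Assigning through a boolean mask keeps the original row order of the schedule, so no re-sorting is needed after the two subsets are filled in.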